58 research outputs found

    Taking Document Structure into Account for the Discovery of Unexpected Information

    National audience. In this article we are interested in taking document structure into account in a process of discovering unexpected information within a corpus of textual documents. Following initial work aimed at designing and implementing unexpectedness measures in a system called UnexpectedMiner, we have sought to improve its performance by taking into account the structure of the analyzed documents. Each part of a document is weighted by coefficients whose values are determined by an optimization algorithm. These coefficients are then integrated into the unexpectedness measures used by UnexpectedMiner to determine whether or not a document is unexpected. The performance of our new system is evaluated, highlighting the improvements brought about by taking document structure into account.
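
    As a rough illustration of the weighting idea only (not UnexpectedMiner's actual measures), the sketch below combines per-section unexpectedness scores with section coefficients and tunes those coefficients with a naive grid search; the section names, the toy scores, and the optimization loop are all assumptions.

import itertools

SECTIONS = ["title", "abstract", "body"]  # assumed document structure

def document_score(section_scores, weights):
    """Weighted unexpectedness of one document (higher = more unexpected)."""
    return sum(weights[s] * section_scores[s] for s in SECTIONS)

def accuracy(docs, labels, weights, threshold=0.5):
    """Fraction of documents whose thresholded score matches the expert label."""
    preds = [document_score(d, weights) >= threshold for d in docs]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def grid_search_weights(docs, labels, step=0.25):
    """Naive stand-in for the optimization step: coarse grid search on coefficients."""
    best, best_acc = None, -1.0
    grid = [i * step for i in range(int(1 / step) + 1)]
    for combo in itertools.product(grid, repeat=len(SECTIONS)):
        weights = dict(zip(SECTIONS, combo))
        acc = accuracy(docs, labels, weights)
        if acc > best_acc:
            best, best_acc = weights, acc
    return best, best_acc

# Toy data: per-section unexpectedness scores assumed to be computed elsewhere.
docs = [{"title": 0.9, "abstract": 0.7, "body": 0.2},
        {"title": 0.1, "abstract": 0.2, "body": 0.3}]
labels = [True, False]  # True = judged unexpected
print(grid_search_weights(docs, labels))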

    A Theoretical Framework for Managing Large Pattern Bases

    National audience. Data mining algorithms are now able to process large volumes of data, but users are often overwhelmed by the quantity of patterns generated. Moreover, in some cases, whether for confidentiality or cost reasons, users may not have direct access to the data and may only have the patterns at their disposal. They then no longer have the possibility of refining the mining process from the initial data in order to extract more specific patterns. To remedy this situation, one solution is to manage the patterns themselves. In this article we therefore present a theoretical framework that allows a user to manipulate, as a post-processing step, a previously extracted collection of patterns. We propose to represent the collection as a graph that the user can then exploit, using algebraic operators, to retrieve existing patterns or to search for new ones.
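
    A minimal sketch of the graph view of a pattern collection, assuming itemset patterns and only two toy operators (selection and downward navigation); the paper's actual algebra and graph model are richer than this.

# Patterns are frozensets of items; edges link a pattern to its immediate supersets.
patterns = [frozenset(p) for p in
            [{"a"}, {"b"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]]

# Build the graph: each pattern points to the patterns extending it by one item.
children = {p: [q for q in patterns if p < q and len(q) == len(p) + 1]
            for p in patterns}

def select(predicate):
    """Selection operator: keep the patterns satisfying a predicate."""
    return [p for p in patterns if predicate(p)]

def specializations(p):
    """Navigation operator: all patterns reachable below p in the graph."""
    out, stack = set(), [p]
    while stack:
        cur = stack.pop()
        for child in children.get(cur, []):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

print(select(lambda p: "a" in p))          # patterns containing item "a"
print(specializations(frozenset({"a"})))   # everything that refines {a}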

    Efficient Management of Non Redundant Rules in Large Pattern Bases: a Bitmap Approach

    International audience. Knowledge Discovery from Databases has an ever-increasing impact nowadays, and various tools are now available to efficiently extract knowledge (in terms of time and memory space) from huge databases. Nevertheless, these systems generally produce large pattern bases, and managing them rapidly becomes intractable. Few works have focused on pattern base management systems, and research in this domain is still very recent. This paper falls within that context and deals with a particular class of patterns, namely association rules. More precisely, we present how we have efficiently implemented the search for non-redundant rules thanks to a representation of rules as bitmap arrays. Experiments show that this technique dramatically increases the gain in time and space, allowing us to manage large pattern bases.
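
    The bitmap idea can be sketched as follows, with one bit per transaction so that supports and rule measures reduce to bitwise ANDs and popcounts; the toy data, the simplified redundancy test, and the data structures are assumptions, not the paper's implementation.

# Toy transaction database.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]

# One integer bitmap per item: bit i is set if transaction i contains the item.
items = sorted({x for t in transactions for x in t})
bitmap = {x: sum(1 << i for i, t in enumerate(transactions) if x in t)
          for x in items}

def cover(itemset):
    """Bitmap of the transactions containing every item of the itemset."""
    bits = (1 << len(transactions)) - 1
    for x in itemset:
        bits &= bitmap[x]
    return bits

def support(itemset):
    """Number of set bits in the cover (popcount)."""
    return bin(cover(itemset)).count("1")

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

def is_redundant(antecedent, consequent):
    """Simplified test: a more general antecedent already yields the same cover."""
    full = cover(antecedent | consequent)
    return any(cover((antecedent - {x}) | consequent) == full
               for x in antecedent if len(antecedent) > 1)

print(support({"a", "b"}), confidence({"a"}, {"b"}))
print(is_redundant({"a", "b"}, {"c"}))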

    Correct your Text with Google

    To appear in the Proceedings of the International Conference on Web Intelligence, IEEE, 2007. International audience. With the increasing amount of text files produced nowadays, spell checkers have become essential tools for the everyday tasks of millions of end users. Over the years, several tools showing decent performance have been designed. Of course, grammatical checkers may improve text correction; nevertheless, this requires large resources. We think that basic spell checking may be improved by using the Web as a corpus and by taking into account the context of words identified as potential misspellings. We propose to use the Google search engine and some machine learning techniques in order to design a flexible and dynamic spell checker that may evolve over time with new linguistic features.
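
    A hedged sketch of the context-based idea: rank candidate corrections of a suspicious word by how frequent the surrounding phrase is on the Web. The web_hit_count function below is a stub standing in for whatever search API is available (it is not an actual Google API), and the candidate generation is deliberately minimal.

def web_hit_count(phrase: str) -> int:
    """Stub: hit count for an exact phrase query (replace with a real search API)."""
    fake_counts = {"will definately come": 300, "will definitely come": 2500000}
    return fake_counts.get(phrase, 0)

def candidates(word):
    """Candidate corrections at edit distance 1 (deletions and substitutions only)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right for c in letters}
    return deletes | substitutions | {word}

def correct_in_context(left, word, right):
    """Pick the candidate whose phrase '<left> <candidate> <right>' is most frequent."""
    return max(candidates(word),
               key=lambda c: web_hit_count(f"{left} {c} {right}".strip()))

print(correct_in_context("will", "definately", "come"))  # -> 'definitely' with the stub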

    Accurate Visual Features for Automatic Tag Correction in Videos

    International audience. We present a new system for video auto-tagging which aims at correcting the tags provided by users for videos uploaded on the Internet. Unlike most existing systems, our proposal uses neither the questionable textual information nor any supervised learning system to perform tag propagation. We propose to directly compare the visual content of the videos, described by different sets of features such as Bag-of-Visual-Words or frequent patterns built from them. We then propose an original tag correction strategy based on the frequency of the tags in the visual neighborhood of the videos. Experiments on a YouTube corpus show that our method can effectively improve the existing tags and that frequent patterns are useful for constructing accurate visual features.
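
    A minimal sketch, on made-up data, of correcting tags from a video's visual neighborhood: videos are compared through Bag-of-Visual-Words histograms and a tag is kept or added when it is frequent among the nearest neighbors. The similarity measure, the neighborhood size, and the frequency threshold are assumptions, not the paper's exact strategy.

import math

def cosine(a, b):
    """Cosine similarity between two Bag-of-Visual-Words histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

videos = {
    "v1": {"bow": [5, 0, 2, 1], "tags": {"cat", "funny"}},
    "v2": {"bow": [4, 1, 2, 0], "tags": {"cat"}},
    "v3": {"bow": [0, 6, 0, 3], "tags": {"car"}},
    "v4": {"bow": [5, 0, 3, 1], "tags": {"dog", "funny"}},  # 'dog' is a wrong tag
}

def corrected_tags(vid, k=2, min_freq=0.5):
    """Keep or add a tag iff at least min_freq of the k nearest neighbors carry it."""
    me = videos[vid]
    neighbors = sorted((v for v in videos if v != vid),
                       key=lambda v: cosine(me["bow"], videos[v]["bow"]),
                       reverse=True)[:k]
    counts = {}
    for v in neighbors:
        for t in videos[v]["tags"]:
            counts[t] = counts.get(t, 0) + 1
    return {t for t, c in counts.items() if c / k >= min_freq}

print(corrected_tags("v4"))  # visually close to v1/v2, so 'cat'/'funny' dominate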

    Unsupervised Video Tag Correction System

    National audience. We present a new system for video auto-tagging which aims at correcting and completing the tags provided by users for videos uploaded on the Internet. Unlike most existing systems, we do not learn any tag classifiers or use the questionable textual information to compare our videos. We propose to directly compare the visual content of the videos, described by different sets of features such as Bag-of-Visual-Words or frequent patterns built from them. Then, we propagate tags between visually similar videos according to the frequency of these tags in a given video's neighborhood. We also propose a controlled experimental setup to evaluate such a system. Experiments show that, with suitable features, we are able to correct a reasonable amount of tags in Web videos.
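
    One detail mentioned above, frequent patterns built from visual words, can be sketched as follows on toy data: binarized Bag-of-Visual-Words sets are mined for frequent itemsets, which then serve as binary features. The support threshold, the pattern size limit, and the binarization rule are assumptions, not the papers' exact construction.

from itertools import combinations

# Binarized BoW: the set of visual words present in each video (toy data).
bows = [{"w1", "w3"}, {"w1", "w2", "w3"}, {"w1", "w3", "w4"}, {"w2", "w4"}]

def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Naive enumeration of itemsets occurring in >= min_support of the videos."""
    items = sorted({w for t in transactions for w in t})
    frequent = []
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            sup = sum(set(cand) <= t for t in transactions) / len(transactions)
            if sup >= min_support:
                frequent.append((frozenset(cand), sup))
    return frequent

patterns = frequent_itemsets(bows)

def pattern_features(bow):
    """Binary feature vector: one bit per frequent pattern contained in the BoW."""
    return [int(p <= bow) for p, _ in patterns]

print(patterns)
print(pattern_features({"w1", "w3"}))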

    Teaching Experiments and Programming for Machine Learning


    Sequence Mining Without Sequences: a New Way for Privacy Preserving

    International audience. During the last decade, sequential pattern mining has been at the core of numerous research efforts. It is now possible to efficiently discover users' behavior in various domains, such as purchases in supermarkets, Web site visits, etc. Nevertheless, classical algorithms do not respect individuals' privacy, since they exploit personal information (name, IP address, etc.). We provide an original solution to privacy preservation by using a probabilistic automaton instead of the original data. An application to car flow modeling is presented, showing the ability of our algorithm to discover frequent routes without any individual information. A comparison with SPAM shows that, even though we sample from the automaton, our approach is more efficient.
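
    An illustrative sketch of the automaton idea: a tiny probabilistic automaton stands in for the anonymized model, and symbol sequences are sampled from it so that frequent routes can be estimated without any individual data. The states, symbols, and probabilities below are made up (e.g. road segments in a car-flow setting).

import random
from collections import Counter

# state -> list of (probability, emitted symbol, next state); next state None = stop.
automaton = {
    "start": [(0.6, "A", "s1"), (0.4, "B", "s2")],
    "s1":    [(0.7, "C", "s2"), (0.3, "D", None)],
    "s2":    [(1.0, "D", None)],
}

def sample_sequence(rng):
    """Draw one symbol sequence by walking the automaton from 'start'."""
    state, seq = "start", []
    while state is not None:
        r, acc = rng.random(), 0.0
        for prob, symbol, nxt in automaton[state]:
            acc += prob
            if r <= acc:
                seq.append(symbol)
                state = nxt
                break
    return tuple(seq)

rng = random.Random(0)
samples = [sample_sequence(rng) for _ in range(1000)]
# Frequent routes are then estimated from the samples, with no personal data involved.
for seq, count in Counter(samples).most_common(3):
    print(seq, count / len(samples))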
    • …